Conversation


@memodi memodi commented Oct 14, 2025

Description

NETOBSERV-2443: fix bug, improve cleanup and file writing

With the help of Claude, I was able to identify the flakiness coming from the pty handling and made a bunch of improvements, listed below:

  • Complete output capture - all lines are captured, with no race conditions
  • Proper timeout handling - API calls respect the polling context timeouts
  • Reliable cleanup - ignores SIGHUP and completes the deletion
  • Absolute paths for file reads
  • Clean up the output/flow directory after every test, so the next test won't read from the same file
  • Use the OCP-XXXX and test label combination to name the output files of the collector and cleanup commands

I made several runs, and the CLI tests are now much more stable.
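
For illustration, here is a minimal Go sketch of the SIGHUP-ignoring cleanup idea. The function name, command invocation, and timeout are assumptions for this sketch, not the exact code in e2e/integration-tests/cli.go:

```go
package clitest

import (
	"context"
	"fmt"
	"os/exec"
	"os/signal"
	"syscall"
	"time"
)

// runCleanup is a hypothetical helper: it ignores SIGHUP so that the pty
// closing does not interrupt the deletion, then runs the CLI cleanup command
// and waits for it to finish within a bounded timeout.
func runCleanup(kubeconfig string) error {
	// Ignore SIGHUP for the duration of the cleanup; restore afterwards.
	signal.Ignore(syscall.SIGHUP)
	defer signal.Reset(syscall.SIGHUP)

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	// "oc netobserv cleanup" is assumed here to be the CLI's cleanup entry point.
	cmd := exec.CommandContext(ctx, "oc", "netobserv", "cleanup")
	cmd.Env = append(cmd.Environ(), "KUBECONFIG="+kubeconfig)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("cleanup failed: %w, output: %s", err, out)
	}
	return nil
}
```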

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).


openshift-ci bot commented Oct 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jotak for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@memodi memodi added the no-qe label Oct 14, 2025

codecov bot commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 13.82%. Comparing base (1654142) to head (d8a6355).

Files with missing lines Patch % Lines
e2e/integration-tests/cli.go 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
- Coverage   13.84%   13.82%   -0.02%     
==========================================
  Files          18       18              
  Lines        2731     2734       +3     
==========================================
  Hits          378      378              
- Misses       2329     2332       +3     
  Partials       24       24              
Flag Coverage Δ
unittests 13.82% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
e2e/integration-tests/cli.go 0.00% <0.00%> (ø)


memodi commented Oct 14, 2025

/test ?


openshift-ci bot commented Oct 14, 2025

@memodi: The following commands are available to trigger required jobs:

/test images
/test integration-tests

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


memodi commented Oct 14, 2025

/test integration-tests

@memodi memodi requested a review from jpinsonneau October 16, 2025 21:00

memodi commented Oct 17, 2025

/test integration-tests


memodi commented Oct 17, 2025

Integration tests are failing because, for some reason, the CI cluster is taking too long to pull images.

/test integration-tests


memodi commented Oct 21, 2025

/test integration-tests

1 similar comment

memodi commented Oct 24, 2025

/test integration-tests

memodi and others added 4 commits October 24, 2025 11:45
- Increase waitDaemonset timeout from 50s to 5 minutes (30×10s)
  * CI environments often have slow image pulls
  * Previous timeout was too aggressive for registry operations

- Add comprehensive diagnostic output on pod startup failure:
  * Pod status with node placement (get pods -o wide)
  * Recent events to identify ImagePullBackOff, etc.
  * Pod event details from describe output
  * Daemonset logs if containers started

This helps diagnose ContainerCreating issues in CI where pods
fail to start due to image pull problems or resource constraints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
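
A rough Go sketch of collecting that diagnostic output on failure; the helper name and the exact set of oc commands here are illustrative, not the repo's implementation:

```go
package clitest

import (
	"fmt"
	"os/exec"
)

// dumpDaemonsetDiagnostics is a hypothetical helper mirroring the diagnostics
// described above: on pod-startup failure it prints pod placement, recent
// events (to spot ImagePullBackOff and similar), and any available logs.
func dumpDaemonsetDiagnostics(namespace, daemonset string) {
	cmds := [][]string{
		{"oc", "get", "pods", "-n", namespace, "-o", "wide"},
		{"oc", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"},
		{"oc", "describe", "daemonset", daemonset, "-n", namespace},
		{"oc", "logs", "daemonset/" + daemonset, "-n", namespace, "--tail=50"},
	}
	for _, c := range cmds {
		out, err := exec.Command(c[0], c[1:]...).CombinedOutput()
		fmt.Printf("$ %v\n%s\n", c, out)
		if err != nil {
			// Best-effort diagnostics: keep going even if one command fails,
			// e.g. logs are unavailable because containers never started.
			fmt.Printf("(command failed: %v)\n", err)
		}
	}
}
```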
CI runs showed only 4/6 pods ready within the 5-minute timeout, indicating
that image pulls need more time. Increase the timeout to 10 minutes (60×10s)
to accommodate slower CI registry pulls and pod scheduling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
In E2E test mode, the bash script's waitDaemonset() could exit with
error after 10 minutes while the Go test's isDaemonsetReady() was
still polling. This created a race where:

1. Go test calls StartCommand() which runs bash script async
2. Bash script calls waitDaemonset() and waits 10 mins
3. Go test calls isDaemonsetReady() and waits 10 mins
4. If bash times out first, it calls exit 1, killing the process
5. Go test is left polling a dead command

Solution: When isE2E=true, skip the bash-level wait since the Go
test framework handles pod readiness checking via isDaemonsetReady().

For manual CLI usage (isE2E=false), the wait still runs as before
to provide user feedback.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
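
A rough sketch of the Go-side readiness polling that makes the bash-level wait redundant in E2E mode; the client setup, function signature, and exact intervals are assumptions, not the repo's helpers:

```go
package clitest

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// isDaemonsetReady polls until the capture daemonset reports all desired pods
// ready, or the timeout expires. When isE2E=true this is the only wait; the
// bash script's waitDaemonset is skipped, so there is no competing exit path.
func isDaemonsetReady(ctx context.Context, cs kubernetes.Interface, ns, name string) error {
	return wait.PollUntilContextTimeout(ctx, 10*time.Second, 10*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			ds, err := cs.AppsV1().DaemonSets(ns).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				// Not created yet (or transient API error): keep polling.
				return false, nil
			}
			desired := ds.Status.DesiredNumberScheduled
			return desired > 0 && ds.Status.NumberReady == desired, nil
		})
}
```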
Tests were failing because:
1. Commands ran with --max-time=1m in foreground mode
2. After 1 minute, capture finished and auto-cleanup ran
3. Cleanup deleted the daemonset
4. isDaemonsetReady() was polling for a deleted daemonset
5. Test failed with context deadline exceeded

Using --background mode prevents automatic cleanup when the
capture finishes, allowing the test to verify daemonset
privilege settings before cleanup runs.
Also, check that the CLI pod is running instead of just the daemonset.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
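
For illustration, a minimal sketch of starting the capture in background mode so the daemonset survives the capture window for inspection; the flags beyond --background and --max-time, and the helper name, are assumptions:

```go
package clitest

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// startBackgroundCapture launches the capture with --background so that the
// capture finishing does not trigger the CLI's automatic cleanup, leaving the
// daemonset in place for the test to inspect before an explicit cleanup.
func startBackgroundCapture(ctx context.Context) error {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()

	cmd := exec.CommandContext(ctx, "oc", "netobserv", "flows",
		"--background", "--max-time=1m")
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("failed to start capture: %w, output: %s", err, out)
	}
	// The test then polls for the CLI pod and daemonset to be running (see
	// isDaemonsetReady above), verifies the daemonset privilege settings,
	// and finally runs the explicit cleanup.
	return nil
}
```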

openshift-ci bot commented Oct 24, 2025

@memodi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/integration-tests
Commit: 553e707
Details: link
Required: true
Rerun command: /test integration-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
